Final Presentation

Group 3

December 4, 2017

Group 3 Objective: Association between dependent and independent variables

The objectives were to create summaries and visualizations of how the dependent variable is associated with the different independent variables. Our goal was to develop different models to analyze these associations in the data. Our work included:

Rationale:

Idea development:

We explored several models to see whether associations could be detected between the independent and dependent variables. It was difficult to draw strong conclusions about association because the dataset provided at this stage was small and did not include many drugs. However, we did find similar patterns between certain in vitro or in vivo tests and ELU. We also explored how these associations change with drug dosage. Interestingly, drug dose appears to be an effect modifier for some of the relationships, which will be interesting to follow up on as more data are provided.
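One common way to check whether dose modifies an association is to add an interaction term to a linear model. The sketch below uses simulated data only; the column names `x`, `dose`, and `y` and the slopes are assumptions made for illustration, and this is not necessarily how the group tested for effect modification.

```r
# Simulated example: the slope of y on x differs between two dose groups
set.seed(7)
sim <- data.frame(
  x    = rep(1:20, times = 2),
  dose = factor(rep(c(50, 100), each = 20))
)
true_slope <- ifelse(sim$dose == "50", 2, 0.5)
sim$y <- 1 + true_slope * sim$x + rnorm(nrow(sim), sd = 0.1)

# `x * dose` expands to x + dose + x:dose; the x:dose100 coefficient
# estimates how much the slope changes at dose 100 relative to dose 50
fit <- lm(y ~ x * dose, data = sim)
interaction_coef <- coef(fit)[["x:dose100"]]
interaction_coef
```

An interaction coefficient far from zero, relative to its standard error, suggests that the association between `x` and `y` depends on dose.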

Room for errors:

Room for errors (continued):

Next steps:

-The next step is to include more data representing more drugs.

-Validating the models (for RandomForest and Lasso) would help us understand how well they predict drug efficacy. The data can be split into subsets, with the models fit on one subset and tested on another.

-A function could be written that runs all of the models and outputs the coefficients.
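The validation step could look something like the following hold-out sketch. It uses simulated data and `lm()` as a stand-in for the RandomForest and Lasso models; the 70/30 split and all variable names are assumptions, not part of the original analysis.

```r
# Simulated data: y depends on x1 but not on x2
set.seed(42)
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + rnorm(n, sd = 0.5)

# Hold out 30% of the rows for testing
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit on the training rows, then predict the held-out rows
fit  <- lm(y ~ x1 + x2, data = train)
pred <- predict(fit, newdata = test)

# Compare test error against always predicting the training mean
rmse          <- sqrt(mean((test$y - pred)^2))
baseline_rmse <- sqrt(mean((test$y - mean(train$y))^2))
c(model = rmse, baseline = baseline_rmse)
```

A model with real predictive power should have a clearly lower test RMSE than the baseline. With a dataset as small as the current one, repeated splits or cross-validation would probably give a more stable estimate than a single split.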

Functions, Get Ready!!!:

Image retrieved from: https://www.pinterest.com/pin/233624299389735946/

Function 1: Fitting Linear Regression Model

Linear Regression Function Code

library(dplyr)
library(tidyr)
library(ggplot2)
library(ggthemes)

linear_model <- function(peak_trough, dep_var, 
                      data = efficacy_summary) {
  stopifnot(dep_var %in% c("ELU", "ESP"))

  # Keep one level (e.g. "Cmax") and reshape the independent
  # variables into long format
  function_data <- data %>% 
    filter(level == peak_trough) %>% 
    gather(key = independent_var, value = indep_measure, 
           -drug, -dosage, -dose_int, -level, -ELU, -ESP, 
           na.rm = TRUE) %>% 
    select(drug, dosage, dose_int, level, all_of(dep_var), 
           indep_measure, independent_var) 
  
  # Copy the chosen dependent variable into a fixed column name
  function_data$vect <- function_data[[dep_var]]

  # Fit one linear model on a standardized independent variable
  model_function <- function(data) {
    lm(vect ~ scale(indep_measure), data = data)
  }

Function Code, Continued

  estimate_results <- function_data %>% 
    group_by(independent_var, dose_int) %>% 
    nest() %>% 
    mutate(mod_results = purrr::map(data, 
                            model_function)) %>% 
    mutate(mod_coefs = purrr::map(mod_results, 
                            broom::tidy)) %>% 
    select(independent_var, dose_int, mod_results, 
                            mod_coefs) %>% 
    unnest(mod_coefs) %>% 
    filter(term == "scale(indep_measure)")

Linear Model Function Code, Continued

  # Plot one point per independent variable and dose interval;
  # point size is inversely proportional to the standard error
  coef_plot <- estimate_results %>%
    mutate(independent_var = forcats::fct_reorder(
        independent_var, estimate, .fun = max)) %>%
    rename(Dose_Interval = dose_int) %>% 
    ggplot(aes(x = estimate, y = independent_var, 
              color = Dose_Interval)) +
    geom_point(aes(size = 1 / std.error)) +
    scale_size_continuous(guide = "none") +
    theme_few() + 
    ggtitle(label = paste("Linear model coefficients as function",
                          "of independent variables,\nby drug dose",
                          "and model uncertainty"),
            subtitle = paste("Smaller points have more uncertainty",
                             "than larger points")) +
    geom_vline(xintercept = 0, color = "cornflowerblue") 
  
  coef_plot
}
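The function standardizes each independent variable with `scale()` before fitting, which is what makes coefficients comparable across variables measured in different units. The base-R sketch below illustrates that idea on simulated long-format data; the variable names `var_a` and `var_b` and all values are made up for illustration, and it uses `split()` and `lapply()` in place of `nest()` and `purrr::map()`.

```r
# Simulated long-format data: two variables on very different scales
set.seed(1)
long <- data.frame(
  independent_var = rep(c("var_a", "var_b"), each = 30),
  indep_measure   = c(rnorm(30, mean = 10, sd = 2),
                      rnorm(30, mean = 100, sd = 20))
)
raw_slope <- ifelse(long$independent_var == "var_a", 0.5, 0.01)
long$vect <- raw_slope * long$indep_measure + rnorm(60, sd = 0.1)

# Fit one model per variable and keep the standardized slope and its SE
fits <- lapply(split(long, long$independent_var), function(d) {
  fit <- lm(vect ~ scale(indep_measure), data = d)
  coef(summary(fit))[2, c("Estimate", "Std. Error")]
})
slopes <- do.call(rbind, fits)
slopes
```

Each estimate is the expected change in the dependent variable per one standard deviation of the independent variable, so the rows can be compared directly even though the raw units differ.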

Linear Model Function, Input Parameters:

Linear Model- Visualize independent variable coefficients - ELU

#Sample code for function, linear_model (Cmax and ELU)
linear_model(peak_trough = "Cmax", dep_var = "ELU")

Linear Model- Visualize independent variable coefficients - ESP

#Sample code for function, linear_model (Cmax and ESP)
linear_model(peak_trough = "Cmax", dep_var = "ESP")

Linear Model Interpretation

Regression Tree Function

Regression Tree Function code

# Grow a regression tree for ELU on all candidate predictors
rpart(ELU ~ drug + dosage + level + 
        plasma + `Uninvolved lung` + `Rim (of Lesion)` + 
        `Outer Caseum` + `Inner Caseum` + 
        `Standard Lung` + `Standard Lesion` + cLogP + 
        `Human Plasma Binding` + 
        `Mouse Plasma Binding` + `MIC Erdman Strain` + 
        `MIC Erdman Strain with Serum` + 
        `MIC rv strain` + `Caseum binding` + 
        `Macrophage Uptake (Ratio)`,
      data = function_data, 
      # A negative cp means no split is rejected for improving the
      # fit too little; minsplit and minbucket control tree size
      control = rpart.control(cp = -1, 
                              minsplit = min_split, 
                              minbucket = min_bucket))

Regression Tree Function input parameters

regression_tree(dep_var = "ELU", min_split = 8, 
                min_bucket = 6)
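To show what `min_bucket` constrains, here is a minimal base-R sketch of how a regression tree can choose a single split on one numeric predictor by minimizing the within-node sum of squared errors. The helper `best_split` is hypothetical and written for illustration only; it is not rpart's implementation.

```r
# Hypothetical helper: find the cut point on x that minimizes the total
# within-node sum of squared errors, with at least min_bucket
# observations on each side
best_split <- function(x, y, min_bucket = 1) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  n <- length(y)
  best <- list(cut = NA_real_, sse = sum((y - mean(y))^2))
  if (n < 2 * min_bucket) return(best)  # too few rows to split
  for (i in seq(min_bucket, n - min_bucket)) {
    if (x[i] == x[i + 1]) next          # cannot cut between tied values
    left  <- y[1:i]
    right <- y[(i + 1):n]
    sse <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
    if (sse < best$sse) {
      best <- list(cut = (x[i] + x[i + 1]) / 2, sse = sse)
    }
  }
  best
}

x_toy <- c(1, 2, 3, 10, 11, 12)
y_toy <- c(1, 1, 1, 5, 5, 5)
split_ok      <- best_split(x_toy, y_toy, min_bucket = 3)
split_blocked <- best_split(x_toy, y_toy, min_bucket = 4)
```

With six observations, `min_bucket = 3` still allows the split between 3 and 10, while `min_bucket = 4` leaves no valid split. On a small dataset, these settings largely decide how deep the tree can grow.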

Regression Tree Function example (ELU)

Regression Tree Function interpretation

Regression Tree Function example (ESP)

LASSO Function

Least Absolute Shrinkage and Selection Operator

Background: We want to predict our outcome using the variables available to us. LASSO is the next generation of step-wise regression and can handle more variables than samples.
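For reference, the LASSO estimate for a continuous outcome can be written as the solution to a penalized least-squares problem. The notation below (N samples, p predictors, penalty weight λ) is introduced here for illustration and follows the form used by glmnet for Gaussian responses:

```latex
\hat{\beta} = \arg\min_{\beta_0,\, \beta}
  \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Larger values of λ shrink more coefficients to exactly zero, which is why LASSO also acts as a variable selection method.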

Example

LASSO Function part 1 preparing the data

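library(dplyr)
library(glmnet)

LASSO_model <- function(dep_var, dose, df = efficacy_summary) {
  # Drop incomplete rows, keep numeric columns, and keep one dose
  data <- na.omit(df) %>% 
    select_if(is.numeric) %>%
    filter(dosage == dose)

  # Build the response and predictors from the filtered data
  response <- data %>% 
    select(all_of(dep_var))

  predictors <- data %>%
    select(c("PLA", "ULU", "RIM", "OCS", "ICS", "SLU", "SLE", "cLogP",
             "huPPB", "muPPB", "MIC_Erdman", "MICserumErd",
             "MIC_Rv", "Caseum_binding", "MacUptake"))

  # glmnet expects a numeric response vector and a predictor matrix
  y <- as.numeric(unlist(response))
  x <- as.matrix(predictors)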

LASSO Function part 2, glmnet

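  # Fit the LASSO path (glmnet's default alpha = 1 is the LASSO penalty)
  fit <- glmnet(x, y)

  # Extract the coefficients at penalty lambda = 0.1 and return them
  coeff <- coef(fit, s = 0.1)
  as.data.frame(as.matrix(coeff))
}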
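To illustrate how the L1 penalty sets some coefficients to exactly zero, here is a minimal base-R sketch of cyclic coordinate descent with soft-thresholding on simulated data. The helpers `soft_threshold` and `lasso_cd` are hypothetical names written for this example; this is a simplified illustration, not glmnet's implementation, and its coefficients are on the standardized scale.

```r
# Soft-thresholding: shrink z toward zero by lambda, and return
# exactly zero when |z| <= lambda
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Hypothetical helper: coordinate descent for the LASSO on
# standardized predictors and a centered response
lasso_cd <- function(x, y, lambda, n_iter = 200) {
  x <- scale(x)
  y_centered <- y - mean(y)
  n <- nrow(x)
  beta <- rep(0, ncol(x))
  for (iter in seq_len(n_iter)) {
    for (j in seq_len(ncol(x))) {
      # Residual with predictor j left out of the current fit
      partial_resid <- y_centered - x[, -j, drop = FALSE] %*% beta[-j]
      z <- sum(x[, j] * partial_resid) / n
      beta[j] <- soft_threshold(z, lambda) / (sum(x[, j]^2) / n)
    }
  }
  setNames(beta, colnames(x))
}

# Simulated data: only x1 is truly related to the outcome
set.seed(1)
x_sim <- matrix(rnorm(50 * 5), nrow = 50,
                dimnames = list(NULL, paste0("x", 1:5)))
y_sim <- 2 * x_sim[, "x1"] + rnorm(50, sd = 0.5)

beta_small <- lasso_cd(x_sim, y_sim, lambda = 0.1)
beta_large <- lasso_cd(x_sim, y_sim, lambda = 1e6)
```

A small penalty keeps the coefficient for `x1` large, while a very large penalty shrinks every coefficient to zero. In the function above, the `s` argument of `coef()` plays the role of `lambda`.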

Testing LASSO function:

LASSO_model(dep_var = "ELU", dose = 50)
predictor coeff
(Intercept) 1.2911027
cLogP 0.2908215
muPPB 0.0049209

Interpretation:

RandomForest Function

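library(randomForest)

# Fit a 500-tree random forest for ELU on all other columns;
# na.roughfix fills in missing values before fitting
efficacy.rf <- randomForest(ELU ~ ., data = dataset,
                            na.action = na.roughfix,
                            ntree = 500, 
                            importance = TRUE)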
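The `na.roughfix` option replaces missing numeric values with the column median (and missing factor values with the most frequent level). The base-R sketch below mimics only the numeric part on made-up values; `median_impute` and the column names are hypothetical and only meant to show what kind of imputation the model relies on.

```r
# Hypothetical helper: replace NAs in numeric columns with the column median
median_impute <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      missing <- is.na(df[[col]])
      df[[col]][missing] <- median(df[[col]], na.rm = TRUE)
    }
  }
  df
}

# Made-up values, for illustration only
toy <- data.frame(outcome   = c(1.2, NA, 0.8, 2.0),
                  predictor = c(3, 5, NA, 7))
filled <- median_impute(toy)
filled
```

Because median imputation pulls missing values toward the center of each column, it can understate variability when many values are missing, which matters for a small dataset.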

Testing Random Forest

Interpretation

Room for errors:

These functions may be prone to several errors if:
